[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733

junrushao · 2019-04-18T08:00:19Z

Description

This PR allows users to convert numpy's NDArray without copying content data. This is achieved by converting numpy's NDArray to DLManagedTensor, which is introduced in dmlc/dlpack#38. We expect that this would help users reduce major performance bottleneck when this conversion is conducted frequently.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Rewrite NDArray::FromDLPack to make it resource safe.
Add DLManagedTensor and its deleter via ctypes that allows C++ side deletes it.

Comments

Nothing.

junrushao · 2019-04-19T18:32:31Z

@reminisce @wkcn Could you help review? Thanks!

reminisce

Nice work!

python/mxnet/ndarray/ndarray.py

tests/python/unittest/test_ndarray.py

python/mxnet/ndarray/ndarray.py

wkcn

We can call MXNDArrayFromDLPack rather than MXNDArrayFromDLManagedTensor.

include/mxnet/c_api.h

src/ndarray/ndarray.cc

tests/python/unittest/test_ndarray.py

python/mxnet/ndarray/ndarray.py

junrushao · 2019-04-27T08:43:17Z

@reminisce Hey I have already addressed your comments. Could you review again? Thanks!

wkcn

Great! LGTM. Thanks for your contribution!

reminisce

Thanks for the awesome work!

wkcn · 2019-04-28T23:12:19Z

Merged. Thank you!

szha · 2019-04-28T23:41:29Z

python/mxnet/ndarray/ndarray.py

+
+    if not ndarray.flags['C_CONTIGUOUS']:
+        raise ValueError("Only c-contiguous arrays are supported for zero-copy")
+    c_obj = _make_dl_managed_tensor(ndarray)


when doing zero-copy, should the ownership of the data pointer be transferred to the new ndarray? do we need to update the writable flag? what happens when the original data is updated by numpy?

@szha They share the ownership - which means numpy's modification is transparent to mxnet and vice versa. As requested by @reminisce in his review, I added a testcase for this transparency.

how does it work when using the asynchronous engine? since numpy calls happen immediately, allowing shared ownership may give unexpected results. consider the following case:

a = np.array(...) b = nd.from_numpy(a, zero_copy=True) c = nd.expensive_op(b, in_place=True) d = np.inplace_op(a) # this is called when c hasn't finished yet

given the asynchronous execution, it's hard to tell whether people should expect d to be based on the input before or after expensive_op

(We actually have such discussion long time before, but it's good to make things open and achievable to the community)

My understanding is that your concern is a general question in multi-threading (see the code below). In any multi-threaded system, we may expect this issue to exist -- and MXNet has multi-threaded engine, so this issue exists, right?

We have discussion that how about the content of array is changed somewhere inside the system but user don't know that, should we set up the WRITEABLE flag to be true to prevent users from writing, or should we completely disallow users to access this numpy array?

The question, in the high-level, is all about trade-off. Restricting the freedom of users may make things less error-prone, while giving users full freedom may make the system more powerful.

On one hand, in my humble opinion, restricting the freedom of users, like setting up WRITEABLE flag, does not completely resolve this issue, because users can still encounter pitfalls when they read the array using the numpy side API. The only helps in this case is to completely disable users from read or write the array, which grant them on access only the numpy side - This is more like guaranteeing safety by disallowing users to create threads/processes.

On the other hand, giving users freedom enables them to do more things, and does not necessarily mean that they will definitely make a mistake, because we will encourage to use mxnet's numpy compatibility, which is on the way.

Anyway, I am open to any discussion and open to possible changes setting up several flags on the numpy array.

The API should have clear semantics regardless of users’ knowledge on how MXNet execution works. If you worry about flexibility I’d recommend making it an option to not disable writable flag.

Once we add a zero-copy option to asnumpy where the ownership is completely transferred back to numpy, I think there wouldn't be any need for co-ownership.

Consider the following case:

# the variable `a` is a `mx.nd.NDArray` object a = mx.nd.array([1,2,3]) b = a.asnumpy(zero_copy=True) # users call `asnumpy(zero_copy=True)` twice c = a.asnumpy(zero_copy=True) c[:] = 3 print(a) print(b) a[:] = 5 a.wait_to_read() print(b) print(c)

It's necessary to keep co-ownership.
We can add extra option in from_numpy and asnumpy to prompt users to call the function wait_to_read().
Alternatively, can we add a hook into numpy.ndarray? When users access numpy.ndarray, wait_to_read() will be called automatically.
Reference: https://docs.scipy.org/doc/numpy/reference/arrays.classes.html

If zero-copy is on, then a should not be usable after the asnumpy call. The point is to let the framework deal with all synchronization, and co-ownership doesn't allow that. And for the same case, you can simply copy c from b (the new numpy array) instead of a (the old ndarray), so it's not necessary to keep co-ownership.

The hook in the context seems to refer to our own subclass of numpy array. Creating a new subclass seems like overkill.

For reference, @szha's suggestion was addressed in #14948

* Enable zero-copy from numpy to MXNet NDArray * should work * make lint happy * fix stupid typos * wip to address comments * Address comments * Address comments * Remove redundant code

Enable zero-copy from numpy to MXNet NDArray

c174b2d

junrushao requested a review from szha as a code owner April 18, 2019 08:00

Roshrini added the pr-work-in-progress PR is still work in progress label Apr 18, 2019

junrushao requested a review from anirudh2290 as a code owner April 18, 2019 23:01

should work

38383e7

junrushao force-pushed the zero-from-numpy branch from 5e1a175 to 38383e7 Compare April 19, 2019 04:19

junrushao added 2 commits April 19, 2019 00:27

make lint happy

9fb797b

fix stupid typos

5674684

junrushao changed the title ~~[WIP] Enable zero-copy from numpy to MXNet NDArray~~ Enable zero-copy from numpy to MXNet NDArray Apr 19, 2019

reminisce reviewed Apr 23, 2019

View reviewed changes

wip to address comments

e825e79

wkcn suggested changes Apr 25, 2019

View reviewed changes

include/mxnet/c_api.h Outdated Show resolved Hide resolved

src/ndarray/ndarray.cc Outdated Show resolved Hide resolved

tests/python/unittest/test_ndarray.py Outdated Show resolved Hide resolved

python/mxnet/ndarray/ndarray.py Show resolved Hide resolved

Address comments

ff3e15e

junrushao added 2 commits April 27, 2019 12:17

Address comments

28e4c2e

Remove redundant code

3ca90c9

junrushao changed the title ~~Enable zero-copy from numpy to MXNet NDArray~~ [MXNET-1398] Enable zero-copy from numpy to MXNet NDArray Apr 27, 2019

wkcn approved these changes Apr 27, 2019

View reviewed changes

reminisce approved these changes Apr 28, 2019

View reviewed changes

wkcn added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-work-in-progress PR is still work in progress labels Apr 28, 2019

wkcn merged commit c18381d into apache:master Apr 28, 2019

szha reviewed Apr 28, 2019

View reviewed changes

RogerChern mentioned this pull request May 9, 2019

Future Roadmap Suggestions tusen-ai/simpledet#66

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733

[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733

junrushao commented Apr 18, 2019 •

edited

Loading

junrushao commented Apr 19, 2019

reminisce left a comment

wkcn left a comment

junrushao commented Apr 27, 2019

wkcn left a comment

reminisce left a comment

wkcn commented Apr 28, 2019

szha Apr 28, 2019

junrushao Apr 29, 2019

szha Apr 29, 2019

junrushao Apr 29, 2019 •

edited

Loading

szha Apr 29, 2019

szha Apr 29, 2019

wkcn Apr 30, 2019 •

edited

Loading

szha Apr 30, 2019

szha Apr 30, 2019

leezu Jul 23, 2019

[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733

[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733

Conversation

junrushao commented Apr 18, 2019 • edited Loading

Description

Checklist

Essentials

Changes

Comments

junrushao commented Apr 19, 2019

reminisce left a comment

Choose a reason for hiding this comment

wkcn left a comment

Choose a reason for hiding this comment

junrushao commented Apr 27, 2019

wkcn left a comment

Choose a reason for hiding this comment

reminisce left a comment

Choose a reason for hiding this comment

wkcn commented Apr 28, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

junrushao Apr 29, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wkcn Apr 30, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

junrushao commented Apr 18, 2019 •

edited

Loading

junrushao Apr 29, 2019 •

edited

Loading

wkcn Apr 30, 2019 •

edited

Loading